Videogames are one of my favorite pastimes, and the development of the competitive scene has always intrigued me. One of the most notable games to establish a global competitive scene backed by paid professionals was the real-time strategy (RTS) game Starcraft II (SC2). In 2019 the top 10 Starcraft players by earnings made nearly four million dollars combined from their winnings alone (“Winnings: 2019 - Liquipedia - the Starcraft Ii Encyclopedia” n.d.). Watching top-caliber players' reflexes and control is astonishing even to seasoned videogame enthusiasts. At the 2019 StarCraft II World Championship Finale, I and many others packed into the arena to see firsthand what these professionals could do.
‘(“Congrats to the Starcraft Ii Wcs Global Finals Champion! - Blizzcon” n.d.)’
The eye-watering speed at which they perform is typically described in gaming terminology as actions per minute (APM). Professionals act so fast (high APM) that it becomes challenging to follow their overall strategy. So beyond marveling at their sheer speed, I found it difficult to define precisely what made these players professionals.
To learn more about what defines talent in SC2, this analysis explores in-game metrics in an attempt to predict a player's rank in competitive mode. The dataset used was provided by ‘(“UCI Machine Learning Repository: SkillCraft1 Master Table Dataset Data Set” n.d.)’.
To model the response LeagueIndex we will explore a sample of player data from a 2013 ranked season of Starcraft. The predictors summarize in-game performance metrics for a season by player (GameID). The modeling process will consider all the predictor variables and then trim them down until only significant predictors remain. Variables will be vetted for confounding, and finally the model will be examined to see whether the BLUE assumptions hold.
The multivariate regression model used for the midterm 2 portions of this study explores the linear estimation of the mean response of LeagueIndex from the predictors \(X\). The validity of this model's approximations assumes LeagueIndex is Gaussian distributed and can vary continuously, but LeagueIndex is an ordinal variable. The levels of LeagueIndex range from 1 to 8, corresponding to the player ranks Bronze, Silver, Gold, Platinum, Diamond, Master, Grand Master, and Professional league. Game ranking systems are frequently based on ELO/MMR, which varies over a much larger range, typically ~1200-3000. These ranges are then masked with medal ranks as listed above and further subdivided into divisions within each medal [Add Source]. Using either MMR or each player's subdivision would provide some much-needed continuity, but unfortunately neither of these metrics is available. These limitations will be revisited more specifically throughout the exploration, modeling, and prediction steps.
A more suitable model for this regression would be a Polytomous Logistic Regression for Ordinal Response (Proportional Odds Model) (“Ordinal Logistic Regression | R Data Analysis Examples” n.d.).
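For reference, the proportional odds model can be written as below. This is the standard textbook form under one common sign convention, not a model fitted in this study. The cutpoints \(\theta_j\) and the number of league levels \(J\) are notation introduced here for illustration.

\[
\log \frac{P(LeagueIndex \le j \mid X)}{P(LeagueIndex > j \mid X)} = \theta_j - X\beta, \qquad j = 1, \dots, J-1
\]

Each cumulative split of the ordinal response gets its own intercept \(\theta_j\), while the slope vector \(\beta\) is shared across splits. This avoids assuming that the league levels are equally spaced.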
This dataset is a sample of Starcraft II players who participated in 2013 ranked play. The variables are as follows:
colnames(sc)
## [1] "GameID" "LeagueIndex" "Age"
## [4] "HoursPerWeek" "TotalHours" "APM"
## [7] "SelectByHotkeys" "AssignToHotkeys" "UniqueHotkeys"
## [10] "MinimapAttacks" "MinimapRightClicks" "NumberOfPACs"
## [13] "GapBetweenPACs" "ActionLatency" "ActionsInPAC"
## [16] "TotalMapExplored" "WorkersMade" "UniqueUnitsMade"
## [19] "ComplexUnitsMade" "ComplexAbilitiesUsed"
The appendix covers each in depth, but the following are highlighted because they are used in the final analysis.
LeagueIndex was covered thoroughly in the section above.
Perception Action Cycles (PACs) are the circular flow of information between an organism and its environment, in which a sensor-guided sequence of behaviors is iteratively followed towards a goal (???). In this dataset, PACs are aggregates of screen movements, where a PAC is a screen fixation containing at least one action (“UCI Machine Learning Repository: SkillCraft1 Master Table Dataset Data Set” n.d.).
Some of the time-averaged metrics are per SC2 timestamp while others are per millisecond. To make these metrics more interpretable, each per-timestamp metric will be converted to a per-millisecond basis. There are roughly 88.5 timestamps per second, or 88.5/1000 timestamps per millisecond, so each metric in timestamps will be multiplied by that coefficient (“UCI Machine Learning Repository: SkillCraft1 Master Table Dataset Data Set” n.d.).
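The unit conversion can be sketched in a few lines of R. This is a minimal sketch: the helper name `per_timestamp_to_per_ms` and the example rates are hypothetical, and it assumes the UCI figure of roughly 88.5 timestamps per real-time second.

```r
# Assumption: roughly 88.5 SC2 timestamps elapse per real-time second
# (figure quoted from the UCI dataset description).
timestamps_per_second <- 88.5

# Convert a rate measured per timestamp into a rate per millisecond.
# Hypothetical helper, not part of the original analysis code.
per_timestamp_to_per_ms <- function(rate_per_timestamp) {
  rate_per_second <- rate_per_timestamp * timestamps_per_second
  rate_per_second / 1000
}

# Illustrative per-timestamp rates for two of the dataset's columns.
example_rates <- c(NumberOfPACs = 0.003434, WorkersMade = 0.001031)
converted <- per_timestamp_to_per_ms(example_rates)
print(converted)
```

Metrics that are already recorded in milliseconds would be left untouched by this step.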
The missing values relate exclusively to players with a LeagueIndex of Professional (8). For the 55 players with LeagueIndex=8, the Age data is NA and HoursPerWeek is 0. LeagueIndexes 1-7 are obtainable through natural game play, whereas to be a professional you would have to be part of a team. I am aiming to understand how players go from being average to good, less so from elite to best, so the rows with NAs associated with professionals will be dropped.
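A minimal sketch of this filtering step follows. The toy data frame `sc_demo` and its values are invented stand-ins for the real `sc`; only the column names come from the dataset.

```r
# Toy stand-in for the real data frame `sc` (values are invented).
sc_demo <- data.frame(
  LeagueIndex  = c(2, 4, 6, 8, 8),
  Age          = c(23, 19, 21, NA, NA),
  HoursPerWeek = c(8, 12, 30, 0, 0)
)

# Count missing values per column to see where the NAs sit.
na_counts <- colSums(is.na(sc_demo))
print(na_counts)

# Check that every row with a missing Age belongs to LeagueIndex 8.
stopifnot(all(sc_demo$LeagueIndex[is.na(sc_demo$Age)] == 8))

# Drop the professional league before modeling.
sc_clean <- subset(sc_demo, LeagueIndex != 8)
print(nrow(sc_clean))
```

Filtering on LeagueIndex rather than on `is.na(Age)` keeps the rule tied to the stated reason for exclusion instead of to a side effect of how the data was collected.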
Question for Colin: Furthermore, while the Bronze through Master leagues (LeagueIndexes 1-6) may contain any number of players, Grand Master (LeagueIndex=7) may only contain 200 players. To analyze only the portions of the data that are “more normal”, the data for the 55 players with LeagueIndex==7 will be dropped.
The following output flags the columns that are problematic for the analysis.
## [1] "Age Values"
## missing
## "55"
## [1] "LeagueIndex Values"
## missing
## "0"
## [1] "Hours Per Week Lower Extreme Values"
## named numeric(0)
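Correlation Plot
library(dplyr)     # provides select_if()
library(corrplot)  # provides corrplot()
# Pairwise correlations among the numeric columns, using complete rows only
sc_cor <- cor(select_if(sc, is.numeric), use = "complete.obs")
sc_cor_plot <- corrplot(sc_cor,
                        tl.cex = 0.75,
                        tl.col = "black")
summary(sc)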
## GameID LeagueIndex Age HoursPerWeek
## Min. : 52 Min. :1.00 Min. :16.00 Min. : 2.00
## 1st Qu.:2423 1st Qu.:3.00 1st Qu.:19.00 1st Qu.: 8.00
## Median :4789 Median :4.00 Median :21.00 Median : 12.00
## Mean :4720 Mean :4.12 Mean :21.65 Mean : 15.91
## 3rd Qu.:6995 3rd Qu.:5.00 3rd Qu.:24.00 3rd Qu.: 20.00
## Max. :9271 Max. :7.00 Max. :44.00 Max. :168.00
## TotalHours APM SelectByHotkeys AssignToHotkeys
## Min. : 3.0 Min. : 22.06 Min. :0.000000 Min. :0.0000000
## 1st Qu.: 300.0 1st Qu.: 79.23 1st Qu.:0.001247 1st Qu.:0.0002017
## Median : 500.0 Median :107.07 Median :0.002447 Median :0.0003487
## Mean : 960.6 Mean :114.58 Mean :0.004024 Mean :0.0003642
## 3rd Qu.: 800.0 3rd Qu.:140.18 3rd Qu.:0.004947 3rd Qu.:0.0004929
## Max. :1000000.0 Max. :389.83 Max. :0.043088 Max. :0.0016483
## UniqueHotkeys MinimapAttacks MinimapRightClicks NumberOfPACs
## Min. : 0.000 Min. :0.000e+00 Min. :0.0000000 Min. :0.000679
## 1st Qu.: 3.000 1st Qu.:0.000e+00 1st Qu.:0.0001388 1st Qu.:0.002743
## Median : 4.000 Median :3.866e-05 Median :0.0002785 Median :0.003377
## Mean : 4.317 Mean :9.381e-05 Mean :0.0003803 Mean :0.003434
## 3rd Qu.: 6.000 3rd Qu.:1.136e-04 3rd Qu.:0.0005078 3rd Qu.:0.004004
## Max. :10.000 Max. :3.019e-03 Max. :0.0036877 Max. :0.007971
## GapBetweenPACs ActionLatency ActionsInPAC TotalMapExplored
## Min. : 6.667 Min. : 24.63 Min. : 2.039 Min. : 5.00
## 1st Qu.: 29.326 1st Qu.: 50.87 1st Qu.: 4.261 1st Qu.:17.00
## Median : 37.057 Median : 61.29 Median : 5.087 Median :22.00
## Mean : 40.710 Mean : 64.21 Mean : 5.267 Mean :22.12
## 3rd Qu.: 48.506 3rd Qu.: 74.03 3rd Qu.: 6.027 3rd Qu.:27.00
## Max. :237.143 Max. :176.37 Max. :18.558 Max. :58.00
## WorkersMade UniqueUnitsMade ComplexUnitsMade ComplexAbilitiesUsed
## Min. :7.698e-05 Min. : 2.000 Min. :0.000e+00 Min. :0.000e+00
## 1st Qu.:6.818e-04 1st Qu.: 5.000 1st Qu.:0.000e+00 1st Qu.:0.000e+00
## Median :9.042e-04 Median : 6.000 Median :0.000e+00 Median :2.047e-05
## Mean :1.031e-03 Mean : 6.542 Mean :6.000e-05 Mean :1.420e-04
## 3rd Qu.:1.258e-03 3rd Qu.: 8.000 3rd Qu.:8.743e-05 3rd Qu.:1.824e-04
## Max. :5.149e-03 Max. :13.000 Max. :9.023e-04 Max. :3.084e-03
Applying the Shapiro-Wilk W test to the response LeagueIndex, we can reject the hypothesis that the response comes from a normally distributed population.
Besides the obvious issues with performing a W test on an ordinal response, the response has a heavy tail with a mean of 4.12. Additionally, we cannot assume the levels of LeagueIndex are uniformly spaced.
[Should I Include:] While the population distribution of rankings is difficult to come by, leagues like Grandmaster (LeagueIndex = 7) are capped at 200 players, which forcibly prevents LeagueIndex 1-7 from having a Gaussian distribution.
qqnorm(scale(sc$APM))
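The normality check on an ordinal response can be illustrated with simulated data. This is a sketch only: `league_demo` is a made-up seven-level sample with invented level probabilities, not the real LeagueIndex column.

```r
set.seed(42)

# Simulated ordinal response on seven levels (not the real LeagueIndex).
league_demo <- sample(1:7, size = 500, replace = TRUE,
                      prob = c(0.05, 0.10, 0.17, 0.25, 0.24, 0.18, 0.01))

# Shapiro-Wilk test: H0 is that the sample comes from a normal population.
sw <- shapiro.test(league_demo)
print(sw)

# Normal Q-Q plot of the standardized values; the discrete levels
# show up as horizontal steps rather than a straight line.
z <- as.numeric(scale(league_demo))
qqnorm(z)
qqline(z)
```

With only seven distinct values the sample is heavily tied, so the test rejects normality largely because the response is discrete. This is the caveat raised above about applying a W test to an ordinal variable.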
Stepwise linear regression:
The function summary() reports the best set of variables for each model size. From the output above, an asterisk specifies that a given variable is included in the corresponding model (“Stepwise Regression Essentials in R - Articles - Sthda” n.d.).
##
## Call:
## lm(formula = LeagueIndex ~ ActionLatency + AssignToHotkeys +
## APM + MinimapAttacks, data = sc)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.2283 -0.6550 0.0370 0.7148 2.8292
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.863e+00 1.387e-01 35.053 <2e-16 ***
## ActionLatency -3.108e-02 1.337e-03 -23.252 <2e-16 ***
## AssignToHotkeys 1.082e+03 9.968e+01 10.858 <2e-16 ***
## APM 6.596e-03 5.577e-04 11.827 <2e-16 ***
## MinimapAttacks 1.099e+03 1.132e+02 9.709 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.008 on 3332 degrees of freedom
## Multiple R-squared: 0.5159, Adjusted R-squared: 0.5153
## F-statistic: 887.7 on 4 and 3332 DF, p-value: < 2.2e-16
##
## Correlation of Coefficients:
## (Intercept) ActionLatency AssignToHotkeys APM
## ActionLatency -0.94
## AssignToHotkeys -0.19 0.13
## APM -0.76 0.63 -0.31
## MinimapAttacks -0.01 0.01 -0.11 -0.10
##
## Shapiro-Wilk normality test
##
## data: residuals(sc_lm_r1)
## W = 0.99628, p-value = 2.262e-07
## [1] 1.082658e-26 3.372155e-61 1.372301e-01 0.000000e+00 2.289001e-31
## [6] 3.646539e-53 3.089689e-09 2.612363e-17 2.099473e-03 1.099522e-48
## [11] 4.031877e-33 6.399003e-26 2.361646e-02 1.431082e-02 2.228961e-10
## [16] 2.473570e-01 1.687128e-02 NA
5 Justify your choices using hypothesis testing and confidence intervals for selection of parameters. You should compare multiple model choices in describing this relationship, including the null model.
6 Evaluate the goodness of fit of the model in terms of R2 and the standard errors, and the major sources of uncertainty. This includes parameter uncertainty, as well as structural uncertainty in the model.
An all-inclusive model was made with every predictor. Predictors were then removed one at a time by t value, least significant first, until only significant predictors remained.
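The elimination procedure can be sketched as a loop. This is a minimal sketch on simulated data, assuming the rule is "drop the predictor with the largest p-value until all remaining p-values are at or below 0.05". The threshold `alpha`, the data frame `demo`, and the variables `x1` to `x4` are hypothetical and do not come from the original analysis.

```r
set.seed(1)
n <- 300

# Simulated predictors: x1 and x2 drive the response, x3 and x4 are noise.
demo <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
demo$y <- 1 + 2 * demo$x1 - 1.5 * demo$x2 + rnorm(n)

alpha <- 0.05                  # assumed significance threshold
fit <- lm(y ~ ., data = demo)  # start from the all-inclusive model

repeat {
  # p-values for the slope terms (the intercept row is excluded).
  p_vals <- summary(fit)$coefficients[-1, "Pr(>|t|)", drop = FALSE]
  if (nrow(p_vals) == 0) break            # no predictors left
  worst <- which.max(p_vals[, 1])
  if (p_vals[worst, 1] <= alpha) break    # everything left is significant
  # Drop the least significant predictor and refit.
  fit <- update(fit, as.formula(paste(". ~ . -", rownames(p_vals)[worst])))
}

print(formula(fit))
```

Because each refit changes the remaining p-values, predictors are dropped one at a time rather than all at once.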
Our initial \(H_0\) is that there is no systematic structure to the response LeagueIndex. Our alternative \(H_a\) is that there is some relation such that \(LeagueIndex=X\beta+\epsilon\), where \(X\) contains all other variables with the exception of the index GameID. Unsurprisingly, using 18 predictor variables results in a very small p-value of ~0. This does not change notably over the iterations across the 9 other models, as the last model also results in a p-value of ~0. Thus for all iterations of the model we can reject the null hypothesis.
Performing an ANOVA test, we find that there is a significant difference between the models (see table below), but the adjusted \(R^2\) is barely different, starting at 0.54 and ending at 0.53.
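The comparison against the null model and between nested models can be sketched as follows. The data are simulated, and the model names `fit_null`, `fit_reduced`, and `fit_full` are hypothetical. The sketch only illustrates how `anova()` and adjusted \(R^2\) can tell different stories.

```r
set.seed(2)
n <- 200

# Simulated data: x1 and x2 matter, x3 is pure noise.
demo <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
demo$y <- 0.5 + 1.2 * demo$x1 + 0.8 * demo$x2 + rnorm(n)

fit_null    <- lm(y ~ 1, data = demo)            # no systematic structure
fit_reduced <- lm(y ~ x1 + x2, data = demo)
fit_full    <- lm(y ~ x1 + x2 + x3, data = demo)

# Sequential F tests: each row compares a model with the one above it.
nested_tests <- anova(fit_null, fit_reduced, fit_full)
print(nested_tests)

# Adjusted R^2 penalizes extra predictors, so it can barely move
# even when an F test flags a difference between two models.
adj_r2 <- c(reduced = summary(fit_reduced)$adj.r.squared,
            full    = summary(fit_full)$adj.r.squared)
print(round(adj_r2, 3))
```

With a sample in the thousands, as in this dataset, an F test can reach significance for a predictor that contributes very little explained variance. Reading the ANOVA table alongside adjusted \(R^2\) helps separate statistical from practical importance.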
confint(sc_lm_9)
## 2.5 % 97.5 %
## (Intercept) 4.017199802 4.832767e+00
## HoursPerWeek 0.004647965 1.043038e-02
## SelectByHotkeys 30.190396835 4.672276e+01
## AssignToHotkeys 637.028609358 1.039799e+03
## UniqueHotkeys 0.013448694 4.574632e-02
## MinimapAttacks 819.988474955 1.257408e+03
## NumberOfPACs 100.568556224 2.259750e+02
## GapBetweenPACs -0.014261027 -8.745618e-03
## ActionLatency -0.026515240 -1.904575e-02
## WorkersMade 184.377889515 3.206924e+02
## Analysis of Variance Table
##
## Model 1: LeagueIndex ~ (GameID + Age + HoursPerWeek + TotalHours + APM +
## SelectByHotkeys + AssignToHotkeys + UniqueHotkeys + MinimapAttacks +
## MinimapRightClicks + NumberOfPACs + GapBetweenPACs + ActionLatency +
## ActionsInPAC + TotalMapExplored + WorkersMade + UniqueUnitsMade +
## ComplexUnitsMade + ComplexAbilitiesUsed) - GameID
## Model 2: LeagueIndex ~ Age + HoursPerWeek + TotalHours + APM + SelectByHotkeys +
## AssignToHotkeys + UniqueHotkeys + MinimapAttacks + MinimapRightClicks +
## NumberOfPACs + GapBetweenPACs + ActionLatency + ActionsInPAC +
## TotalMapExplored + WorkersMade + UniqueUnitsMade + ComplexUnitsMade
## Model 3: LeagueIndex ~ Age + HoursPerWeek + TotalHours + APM + SelectByHotkeys +
## AssignToHotkeys + UniqueHotkeys + MinimapAttacks + NumberOfPACs +
## GapBetweenPACs + ActionLatency + ActionsInPAC + TotalMapExplored +
## WorkersMade + UniqueUnitsMade + ComplexUnitsMade
## Model 4: LeagueIndex ~ Age + HoursPerWeek + APM + SelectByHotkeys + AssignToHotkeys +
## UniqueHotkeys + MinimapAttacks + NumberOfPACs + GapBetweenPACs +
## ActionLatency + ActionsInPAC + TotalMapExplored + WorkersMade +
## UniqueUnitsMade + ComplexUnitsMade
## Model 5: LeagueIndex ~ Age + HoursPerWeek + SelectByHotkeys + AssignToHotkeys +
## UniqueHotkeys + MinimapAttacks + NumberOfPACs + GapBetweenPACs +
## ActionLatency + ActionsInPAC + TotalMapExplored + WorkersMade +
## UniqueUnitsMade + ComplexUnitsMade
## Model 6: LeagueIndex ~ Age + HoursPerWeek + SelectByHotkeys + AssignToHotkeys +
## UniqueHotkeys + MinimapAttacks + NumberOfPACs + GapBetweenPACs +
## ActionLatency + ActionsInPAC + TotalMapExplored + WorkersMade +
## ComplexUnitsMade
## Model 7: LeagueIndex ~ Age + HoursPerWeek + SelectByHotkeys + AssignToHotkeys +
## UniqueHotkeys + MinimapAttacks + NumberOfPACs + GapBetweenPACs +
## ActionLatency + ActionsInPAC + TotalMapExplored + WorkersMade
## Model 8: LeagueIndex ~ Age + HoursPerWeek + SelectByHotkeys + AssignToHotkeys +
## UniqueHotkeys + MinimapAttacks + NumberOfPACs + GapBetweenPACs +
## ActionLatency + TotalMapExplored + WorkersMade
## Model 9: LeagueIndex ~ Age + HoursPerWeek + SelectByHotkeys + AssignToHotkeys +
## UniqueHotkeys + MinimapAttacks + NumberOfPACs + GapBetweenPACs +
## ActionLatency + WorkersMade
## Model 10: LeagueIndex ~ HoursPerWeek + SelectByHotkeys + AssignToHotkeys +
## UniqueHotkeys + MinimapAttacks + NumberOfPACs + GapBetweenPACs +
## ActionLatency + WorkersMade
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 3318 3220.3
## 2 3319 3220.6 -1 -0.2846 0.2932 0.58821
## 3 3320 3221.0 -1 -0.3934 0.4053 0.52440
## 4 3321 3222.0 -1 -1.0234 1.0544 0.30456
## 5 3322 3223.5 -1 -1.4505 1.4945 0.22160
## 6 3323 3226.1 -1 -2.6115 2.6907 0.10103
## 7 3324 3230.3 -1 -4.2026 4.3301 0.03752 *
## 8 3325 3235.2 -1 -4.8534 5.0006 0.02540 *
## 9 3326 3240.4 -1 -5.2451 5.4042 0.02015 *
## 10 3327 3245.3 -1 -4.9366 5.0863 0.02418 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Exploring the chosen model
summary(sc_lm_9, correlation = TRUE)
##
## Call:
## lm(formula = LeagueIndex ~ HoursPerWeek + SelectByHotkeys + AssignToHotkeys +
## UniqueHotkeys + MinimapAttacks + NumberOfPACs + GapBetweenPACs +
## ActionLatency + WorkersMade, data = sc)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.2841 -0.6475 0.0564 0.6843 2.7732
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.425e+00 2.080e-01 21.276 < 2e-16 ***
## HoursPerWeek 7.539e-03 1.475e-03 5.113 3.36e-07 ***
## SelectByHotkeys 3.846e+01 4.216e+00 9.122 < 2e-16 ***
## AssignToHotkeys 8.384e+02 1.027e+02 8.163 4.60e-16 ***
## UniqueHotkeys 2.960e-02 8.236e-03 3.594 0.000331 ***
## MinimapAttacks 1.039e+03 1.115e+02 9.312 < 2e-16 ***
## NumberOfPACs 1.633e+02 3.198e+01 5.105 3.49e-07 ***
## GapBetweenPACs -1.150e-02 1.407e-03 -8.179 4.04e-16 ***
## ActionLatency -2.278e-02 1.905e-03 -11.959 < 2e-16 ***
## WorkersMade 2.525e+02 3.476e+01 7.265 4.64e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9877 on 3327 degrees of freedom
## Multiple R-squared: 0.536, Adjusted R-squared: 0.5348
## F-statistic: 427.1 on 9 and 3327 DF, p-value: < 2.2e-16
##
## Correlation of Coefficients:
## (Intercept) HoursPerWeek SelectByHotkeys AssignToHotkeys
## HoursPerWeek -0.11
## SelectByHotkeys -0.05 -0.13
## AssignToHotkeys -0.10 -0.03 -0.28
## UniqueHotkeys -0.02 0.03 -0.08 -0.24
## MinimapAttacks -0.05 -0.04 -0.02 -0.09
## NumberOfPACs -0.84 -0.03 -0.01 -0.11
## GapBetweenPACs 0.09 0.00 -0.02 0.10
## ActionLatency -0.85 0.04 0.11 0.04
## WorkersMade -0.21 0.02 -0.03 -0.04
## UniqueHotkeys MinimapAttacks NumberOfPACs GapBetweenPACs
## HoursPerWeek
## SelectByHotkeys
## AssignToHotkeys
## UniqueHotkeys
## MinimapAttacks -0.07
## NumberOfPACs -0.15 0.02
## GapBetweenPACs 0.01 0.12 -0.17
## ActionLatency -0.04 0.00 0.71 -0.53
## WorkersMade 0.01 -0.02 -0.05 0.03
## ActionLatency
## HoursPerWeek
## SelectByHotkeys
## AssignToHotkeys
## UniqueHotkeys
## MinimapAttacks
## NumberOfPACs
## GapBetweenPACs
## ActionLatency
## WorkersMade 0.09
plot(sc_lm_9)
Describe your model, how you arrived at it, its goodness of fit, its significance versus other choices of models, and its uncertainty. Describe the predictive power, and the uncertainty. Include relevant tables and figures.
7 Describe your proposed research question for the final. How will you revise your original research question? What issues have you encountered so far? What assumptions do you think you need to (re-)evaluate?
Attribute Information:
“Congrats to the Starcraft Ii Wcs Global Finals Champion! - Blizzcon.” n.d. Accessed October 26, 2020. https://blizzcon.com/en-us/news/23198508/congrats-to-the-starcraft-ii-wcs-global-finals-champion.
“Ordinal Logistic Regression | R Data Analysis Examples.” n.d. Accessed October 27, 2020. https://stats.idre.ucla.edu/r/dae/ordinal-logistic-regression/.
“Stepwise Regression Essentials in R - Articles - Sthda.” n.d. Accessed October 27, 2020. http://www.sthda.com/english/articles/37-model-selection-essentials-in-r/154-stepwise-regression-essentials-in-r/.
“UCI Machine Learning Repository: SkillCraft1 Master Table Dataset Data Set.” n.d. Accessed October 26, 2020. https://archive.ics.uci.edu/ml/datasets/SkillCraft1+Master+Table+Dataset.
“Winnings: 2019 - Liquipedia - the Starcraft Ii Encyclopedia.” n.d. Accessed October 26, 2020. https://liquipedia.net/starcraft2/Winnings/2019.